Neural-statistical Model of TATA-box Motifs in Eukaryotes
Abstract
The TATA-box is one of the most important binding sites in eukaryotic polymerase II promoters, and also one of the most common motifs in them. Its main role is to ensure that the biochemical transcription machinery localizes the transcription start site (TSS) correctly, and it lies at a very regular distance from the TSS. Accurate computational recognition of the TATA-box can therefore improve how accurately computer algorithms determine the TSS location. The conventional recognition model of the TATA-box in DNA sequence analysis is a position weight matrix (PWM), and the PWM model of the TATA-box is widely used in promoter recognition programs. This chapter presents a different, nonlinear recognition model of this motif, based on a combination of statistical and neural network modelling. The resulting TATA-box model uses “statistical filtering” and two LVQ neural networks. The model is derived for a sensitivity level that corresponds to approximately correct recognition of TATA motifs. The system is tested on an independent data set used in the evaluation study by Fickett and Hatzigeorgiou, where it performs better in promoter recognition than three other methods, including the one based on the matching score of the TATA-box PWM of Bucher.
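For readers unfamiliar with the conventional PWM baseline mentioned in the abstract, the following is a minimal sketch of PWM scoring, not the chapter's method and not Bucher's matrix. The aligned sites in `SITES`, the pseudocount, the uniform background frequency, and the function names `build_pwm`, `score_window`, and `best_hit` are all illustrative assumptions.

```python
import math

BASES = "ACGT"

# Hypothetical aligned example sites, for illustration only.
# These are NOT the Bucher TATA-box matrix or real training data.
SITES = ["TATAAA", "TATAAA", "TATATA", "TATAAG", "TTTAAA"]


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM: one dict of base -> score per motif position."""
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window (A, C, G, T only)."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Slide the PWM along the sequence; return (best score, start index)."""
    width = len(pwm)
    scored = [
        (score_window(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


pwm = build_pwm(SITES)
score, start = best_hit(pwm, "GCGCGCTATAAAGGCGCG")
print(f"best window starts at index {start} with score {score:.2f}")
```

A PWM score is a sum of independent per-position terms, which is the linearity that the nonlinear statistical and LVQ model described in the abstract is meant to go beyond.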
Similar resources
Genetic algorithms and extraction of rules for detection of short DNA motifs
The paper presents a method for discovery of specific types of rules related to detection and extraction of explicit potentially biologically active DNA motifs from nucleotide databases. The characteristic of these rules is that they represent a relation of the strengths of signals of two motifs and their mutual distance. The rule extraction is based on a genetic algorithm. The method is applied...
DNA structural features of eukaryotic TATA‐containing and TATA‐less promoters
Eukaryotic genes can be broadly classified as TATA-containing and TATA-less based on the presence of TATA box in their promoters. Experiments on both classes of genes have revealed a disparity in the regulation of gene expression and cellular functions between the two classes. In this study, we report characteristic differences in promoter sequences and associated structural properties of the t...
Transcription from a TATA-less promoter requires a multisubunit TFIID complex.
In eukaryotes, the TATA box-binding protein (TBP) is responsible for nucleating assembly of the transcription initiation machinery. Here, we report that a TFIID complex containing TBP is essential for transcription even at a promoter that lacks a TATA box. Immunopurification of TFIID reveals that the active species in reconstituting TATA-less transcription is a multisubunit complex consisting o...
Comparing the Performance of Several Popular Machine Learning Algorithms on Classifying TATA-box from putative TATA box
A TATA box is a common transcription binding site that occurs upstream of the transcription start site of many genes. Identifying a TATA box accurately is important since it has been shown empirically that a transcription start site (TSS) occurs downstream of a TATA box at a fixed distance that depends only on the species. Unfortunately, many substrings of a DNA sequence fit...
Predominant gain of promoter TATA box after gene duplication associated with stress responses.
TATA box, the core promoter element, exists in a broad range of eukaryotes, and the expression of TATA-containing genes usually responds to various environmental stresses. Hence, the evolution of TATA-box in duplicate genes may provide some clues for the interrelationship among environmental stress, expression differentiation, and duplicate gene preservation. In the present study, we observed t...